Abstract

We present a scheme for playing quantum repeated 2x2 games based on Marinatto and
Weber's approach [1] to quantum games. As a potential application, we study the twice
repeated Prisoner's Dilemma game. We show that results not available in the classical
game can be obtained when the game is played in the quantum way. Before we present our
idea, we comment on the previous scheme of playing quantum repeated games proposed in
[2]. We point out the drawbacks that make the results of [2] unacceptable.



1 Introduction 

The Marinatto-Weber (MW) idea of quantum 2x2 games introduced in [1] has found
application in many branches of game theory. The MW approach to evolutionary games
[3] and to the Stackelberg equilibrium [4] are merely two of many applications. In the papers
[5] and [6] we have shown that the MW idea is applicable to finite games in extensive
form as well. Consequently, this scheme of playing quantum games can be applied
to many other game-theoretical problems. In this paper we deal with the problem
of quantization of twice repeated 2x2 games. Since a finitely repeated game is just
a particular case of a finite extensive game, we apply the method based on [5] and [6] to
play the repeated game in the quantum way. The idea of quantum repeated games was
first introduced in [2], where the authors adapt the MW scheme to the twice repeated
Prisoner's Dilemma. They then investigate whether results that are unavailable when the
game is played classically can occur in the quantum domain. The point of the paper [2] is
to provide sufficient conditions for the players' cooperation in the Prisoner's Dilemma game.
We examine the idea of [2] before we define our scheme. Firstly, we study the problem
of cooperation considered by the authors of [2] and we prove that the players' cooperation
in the game defined by the protocol proposed in [2] is not possible. Secondly, we check
whether that scheme is actually in accordance with the concept of a repeated game. As we
will show, the discussed scheme does not include the classical twice repeated Prisoner's
Dilemma, hence it cannot be the quantum realization of this game in the spirit of the
MW approach. To support our arguments we propose a new protocol for a twice
repeated 2x2 game and prove that our idea generalizes the classical twice repeated
game. Our paper also contains a proof of the advantage of the quantum scheme over
the classical one: we prove that both players can benefit from playing the game via our
protocol. Moreover, we show that, contrary to the situation encountered in the classical
game, cooperation of the players is possible for some Prisoner's Dilemma games
played repeatedly when our quantum approach is used.

Figure 1: The strategic (a) and extensive (b) form of the Prisoner's Dilemma.

Studying our paper requires little background in game theory. All notions such as extensive
game, information set, strategy, equilibrium, subgame perfect equilibrium, etc. used in
the paper are explained in an accessible way, for example, in [7] and [8]. The relevant
preliminaries can also be found in the paper [5], where quantum games in extensive
form are examined.



2 Twice repeated games and the Prisoner's Dilemma 

The Prisoner's Dilemma (PD) is one of the most fundamental problems in game theory
(the general form of the PD according to [9] is given in Fig. 1(a)). It demonstrates why
the rationality of players can lead them to an inefficient outcome. Although the payoff
vector (R,R) is better for both players than (P,P), they cannot obtain this outcome,
since each player's strategy C (cooperation) is strictly dominated by D (defection). As
a result, the players end up with the payoff P corresponding to the unique Nash equilibrium
(D,D).

A similar scenario occurs in the case of the finitely repeated PD. The concept of a finitely
repeated game assumes playing a static game (a stage of the repeated game) a fixed
number of times. Additionally, the players are informed about the results of consecutive
stages. In the twice repeated case this means that each player's strategy specifies an action
at the first stage and four actions at the second stage, where a particular action is chosen
depending on which of the four outcomes of the first stage has occurred. This is clearly
visible when we write the twice repeated game in extensive form (see Fig. 2). The first
stage of the twice repeated PD in extensive form is simply the game depicted in
Fig. 1(b), where the players specify an action C or D at the information sets 1.1 and 2.1,
respectively (the nodes of player 2's information set are connected by a dotted line to
show that the second player does not know the previous move of the first player). When
the players have chosen their actions, the result of the first stage is announced. Since
they know the result of the first stage, they can choose different actions at the second
stage depending on that result; hence four further copies of the game tree from Fig. 1(b)
are required to describe the repeated game. The game tree exhibits ten information sets
(five for each player, labelled 1.· and 2.·, respectively), at each of which the player has
two moves. Thus, each player has 2^5 = 32 strategies, as a strategy specifies C or D at
each of the player's five information sets.
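A minimal Python sketch of this bookkeeping, for illustration only: it assumes the numerical PD values T = 5, R = 3, P = 1, S = 0 (the values later used in (8)) and encodes C as 0 and D as 1. A strategy is a 5-tuple giving the action at the first stage and at the four second-stage information sets, and `play` is a hypothetical helper (not from the paper) that follows one path through the tree of Fig. 2.

```python
from itertools import product

# Illustrative PD stage payoffs (assumed values T=5, R=3, P=1, S=0); 0 = C, 1 = D.
# STAGE[(a, b)] = (payoff of player 1, payoff of player 2).
STAGE = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

# A pure strategy fixes an action at each of the player's five information sets:
# (first stage, after (C,C), after (C,D), after (D,C), after (D,D)).
STRATEGIES = list(product((0, 1), repeat=5))


def play(s1, s2, stage=STAGE):
    """Return the path of play and the summed payoffs of the profile (s1, s2)."""
    a, b = s1[0], s2[0]            # first-stage actions
    k = 1 + 2 * a + b              # second-stage information set selected by (a, b)
    c, d = s1[k], s2[k]            # second-stage actions at that information set
    u, v = stage[(a, b)], stage[(c, d)]
    return ((a, b), (c, d)), (u[0] + v[0], u[1] + v[1])


print(len(STRATEGIES))  # 32
# Player 1 starts with C and then copies player 2's first move; player 2 always defects.
print(play((0, 0, 1, 0, 1), (1, 1, 1, 1, 1)))  # (((0, 1), (1, 1)), (1, 6))
```

The index arithmetic `1 + 2 * a + b` is simply the position of the information set reached after the first-stage outcome (a, b) within the 5-tuple.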
Figure 2: The extensive form for a twice repeated Prisoner's Dilemma.



To find the Nash equilibria of a finitely repeated game it is convenient to use the
property that, along the equilibrium path, an equilibrium profile always prescribes a Nash
equilibrium of the stage game at the last stage. Therefore, to find the Nash equilibria of
the twice repeated PD it is sufficient to consider strategy profiles that generate the
profile (D,D) at the second stage. It then follows that D is the best response for the
players at the first stage as well, since a player who switches to D at the first stage gains
at that stage and can still secure at least P at the second stage by playing D. By induction
it can be shown that, in any finitely repeated PD, every Nash equilibrium yields the action
D at each stage along the path of play, and playing D at every stage after every history is
the unique subgame perfect equilibrium. It is worth noting that if the stage game has more
than one equilibrium, different Nash equilibria may be played at the last stage depending
on the results of previous stages. For example, let us consider the Battle of the Sexes (BoS)
game given by the following bimatrix:



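$$
\begin{array}{c|cc}
  & O & F \\ \hline
O & (\alpha,\beta) & (\gamma,\gamma) \\
F & (\gamma,\gamma) & (\beta,\alpha)
\end{array}
\tag{1}
$$

where $\alpha > \beta > \gamma$.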



It has two pure Nash equilibria, namely (O,O) and (F,F). Let us now examine the
twice repeated BoS. Obviously, its game tree is the same as the one in Fig. 2. Let us assign
the appropriate sum of the two stage payoffs to each possible profile (as was done in the
case of the twice repeated PD). Then we find many different Nash equilibria.
One of them is to play the Nash equilibrium (O,O) at the first stage, to keep playing (O,O)
at the second stage if the outcome of the first stage is (O,O) or (O,F), and otherwise to play
the stage-game Nash equilibrium (F,F).
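The equilibrium claims of this section can be checked by brute force over the 32 x 32 normal form. The sketch below is illustrative. The PD values T = 5, R = 3, P = 1, S = 0 and the BoS values α = 3, β = 2, γ = 0 are assumptions (any α > β > γ would serve). C and O are encoded as 0, D and F as 1. `total` and `pure_nash` are hypothetical helper names.

```python
from itertools import product

STRATEGIES = list(product((0, 1), repeat=5))  # (first stage, after 00, 01, 10, 11)


def total(s1, s2, stage):
    """Sum of the two stage payoffs when the profile (s1, s2) is played."""
    a, b = s1[0], s2[0]
    k = 1 + 2 * a + b                      # information set reached at stage 2
    u, v = stage[(a, b)], stage[(s1[k], s2[k])]
    return u[0] + v[0], u[1] + v[1]


def pure_nash(stage):
    """All pure Nash equilibria of the twice repeated game built from `stage`."""
    pay = {(s, t): total(s, t, stage) for s in STRATEGIES for t in STRATEGIES}
    best1 = {t: max(pay[(s, t)][0] for s in STRATEGIES) for t in STRATEGIES}
    best2 = {s: max(pay[(s, t)][1] for t in STRATEGIES) for s in STRATEGIES}
    return [(s, t) for (s, t), (u1, u2) in pay.items()
            if u1 == best1[t] and u2 == best2[s]]


PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}   # assumed values
BOS = {(0, 0): (3, 2), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (2, 3)}  # alpha=3, beta=2, gamma=0

pd_eq = pure_nash(PD)
# Every equilibrium of the repeated PD plays (D, D) at both stages along its path.
print(all(s[0] == t[0] == 1 and s[4] == t[4] == 1 for s, t in pd_eq))  # True

# BoS profile from the text: (O, O) first; (O, O) again after (O, O) or (O, F), else (F, F).
bos_profile = ((0, 0, 0, 1, 1), (0, 0, 0, 1, 1))
bos_eq = pure_nash(BOS)
print(bos_profile in bos_eq, len(bos_eq) > 1)  # True True
```

Exhaustive search is affordable here because there are only 1024 pure profiles. It would not scale to longer repetitions, where backward induction is the natural tool.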

3 Comment on 'Quantum repeated games' by Iqbal and Toor [2]



Let us recall the scheme of playing the PD repeatedly introduced in [2], which is based on
the MW approach. According to this concept the two-stage PD is placed in the complex
Hilbert space $\mathcal{H} = (\mathbb{C}^2)^{\otimes 4}$ with the computational basis. The game
starts with preparing a 4-qubit pure state represented by a unit vector in $\mathcal{H}$. The
general form of this state is described as follows:

$$
|\psi_{\mathrm{in}}\rangle = \sum_{x_1,x_2,x_3,x_4=0,1}\lambda_{x_1x_2x_3x_4}|x_1,x_2,x_3,x_4\rangle,
\quad\text{where } \lambda_{x_1x_2x_3x_4}\in\mathbb{C} \text{ and } \sum_{x_1,x_2,x_3,x_4=0,1}|\lambda_{x_1x_2x_3x_4}|^2 = 1.
\tag{2}
$$
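Players' moves are identified with the identity operator $\sigma_0$ and the bit-flip Pauli operator
$\sigma_1$. Player 1 is allowed to act on the first and third qubit, and player 2 acts on the second
and fourth one. In the first stage of the game the first two qubits are manipulated. Let
$\rho_{\mathrm{in}}$ be the density operator for the initial state (2). Then the state $\rho$ after the players'
actions takes the form

$$
\rho = \sum_{\kappa_1,\kappa_2=0,1}p_{\kappa_1}q_{\kappa_2}\,(\sigma_{\kappa_1}\otimes\sigma_{\kappa_2}\otimes I^{\otimes 2})\,\rho_{\mathrm{in}}\,(\sigma_{\kappa_1}\otimes\sigma_{\kappa_2}\otimes I^{\otimes 2}),
\quad \sum_{\kappa_1=0,1}p_{\kappa_1} = \sum_{\kappa_2=0,1}q_{\kappa_2} = 1,
\tag{3}
$$

where $p_{\kappa_1}$ ($q_{\kappa_2}$) is the probability of applying $\sigma_{\kappa_1}$ ($\sigma_{\kappa_2}$) to the first (second) qubit. Next,
the other two qubits are manipulated, which, according to Iqbal and Toor, corresponds
to the second stage of the classical game. The operation $\sigma_{\kappa_3}$ on the third qubit with
probability $p_{\kappa_3}$ and the operation $\sigma_{\kappa_4}$ on the fourth qubit with probability $q_{\kappa_4}$ change the
state $\rho$ to

$$
\rho_{\mathrm{fin}} = \sum_{\kappa_3,\kappa_4=0,1}p_{\kappa_3}q_{\kappa_4}\,(I^{\otimes 2}\otimes\sigma_{\kappa_3}\otimes\sigma_{\kappa_4})\,\rho\,(I^{\otimes 2}\otimes\sigma_{\kappa_3}\otimes\sigma_{\kappa_4}),
\quad \sum_{\kappa_3=0,1}p_{\kappa_3} = \sum_{\kappa_4=0,1}q_{\kappa_4} = 1.
\tag{4}
$$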



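The next step is to measure the final state $\rho_{\mathrm{fin}}$ in order to determine the final payoffs. The
measurement is defined by the four payoff operators $X_{i.j}$, $i,j = 1,2$, associated with
player $i$ and stage $j$. That is,

$$
\begin{aligned}
X_{1.1} &= (R|00\rangle\langle 00| + S|01\rangle\langle 01| + T|10\rangle\langle 10| + P|11\rangle\langle 11|)\otimes I^{\otimes 2},\\
X_{1.2} &= I^{\otimes 2}\otimes(R|00\rangle\langle 00| + S|01\rangle\langle 01| + T|10\rangle\langle 10| + P|11\rangle\langle 11|),\\
X_{2.1} &= (R|00\rangle\langle 00| + T|01\rangle\langle 01| + S|10\rangle\langle 10| + P|11\rangle\langle 11|)\otimes I^{\otimes 2},\\
X_{2.2} &= I^{\otimes 2}\otimes(R|00\rangle\langle 00| + T|01\rangle\langle 01| + S|10\rangle\langle 10| + P|11\rangle\langle 11|).
\end{aligned}
\tag{5}
$$

Then the expected payoff $E_{i.j}$ for player $i$ at stage $j$, when player 1 chooses the strategy
$(\sigma_{\kappa_1},\sigma_{\kappa_3})$ and player 2 chooses $(\sigma_{\kappa_2},\sigma_{\kappa_4})$, is obtained by the following formula:

$$
E_{i.j}((\sigma_{\kappa_1},\sigma_{\kappa_3}),(\sigma_{\kappa_2},\sigma_{\kappa_4})) = \mathrm{tr}(X_{i.j}\rho_{\mathrm{fin}}).
\tag{6}
$$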
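For pure strategies the protocol (2)-(6) can be evaluated without matrix algebra. $\sigma_1$ only relabels computational basis states and every $X_{i.j}$ in (5) is diagonal, so $\mathrm{tr}(X_{i.j}\rho_{\mathrm{fin}})$ is a probability-weighted sum of stage payoffs. Below is a minimal Python sketch under these assumptions, using the payoffs (8). The helper name `it_payoffs` and the dictionary representation of $|\psi_{\mathrm{in}}\rangle$ are illustrative choices. Mixed strategies, which follow by linearity, are omitted.

```python
# Stage payoffs (8): T=5, R=3, P=1, S=0.  U[(y1, y2)] = (player 1, player 2)
# is the eigenvalue pair attached to |y1, y2> in the operators (5).
U = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def it_payoffs(psi_in, s1, s2):
    """E_{i.j} of (6) for pure strategies s1 = (k1, k3), s2 = (k2, k4).

    psi_in maps basis tuples (x1, x2, x3, x4) to amplitudes; k = 0 stands for
    sigma_0 (identity) and k = 1 for sigma_1 (bit flip).
    """
    k = (s1[0], s2[0], s1[1], s2[1])                 # operations on qubits 1, 2, 3, 4
    E = {(i, j): 0.0 for i in (1, 2) for j in (1, 2)}
    for x, amp in psi_in.items():
        y = tuple(xq ^ kq for xq, kq in zip(x, k))   # sigma_1 flips the bit
        prob = abs(amp) ** 2
        for j, pair in ((1, y[:2]), (2, y[2:])):      # stage 1: qubits 1-2, stage 2: qubits 3-4
            E[(1, j)] += prob * U[pair][0]
            E[(2, j)] += prob * U[pair][1]
    return E


# With |0000> the scheme reproduces the classical stage payoffs:
print(it_payoffs({(0, 0, 0, 0): 1.0}, (1, 0), (0, 0)))
# {(1, 1): 5.0, (1, 2): 3.0, (2, 1): 0.0, (2, 2): 3.0}
```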
The authors took up the issue of cooperation in the two-stage PD. Given the initial state

$$
|\psi_{\mathrm{in}}\rangle = \lambda_{0000}|0000\rangle + \lambda_{0011}|0011\rangle + \lambda_{1100}|1100\rangle + \lambda_{1111}|1111\rangle
\tag{7}
$$

and the fixed payoffs

$$
T = 5,\quad R = 3,\quad P = 1,\quad S = 0,
\tag{8}
$$

they identify $\sigma_0$ and $\sigma_1$ as the actions of cooperation and defection, respectively, and claim
that the conditions

$$
|\lambda_{0000}|^2 + |\lambda_{0011}|^2 \le \frac{1}{3}, \qquad |\lambda_{0011}|^2 + |\lambda_{1111}|^2 \le \frac{1}{3}
\tag{9}
$$

are sufficient for both players to choose $\sigma_0$ (thereby cooperating) at the first stage, given
that the players have chosen $\sigma_1$ at the second one. Below we raise two objections
concerning the results of the paper [2].
3.1 The incompatibility of the protocol (2)-(6) with the theory of
repeated games

The main fault of the protocol (2)-(6) is that the twice repeated game cannot be
described in this way. In fact, it quantizes the PD played twice when the players
are not informed about the result of the first stage. This becomes apparent, for example, when we
re-examine the way of finding the optimal solution provided in [2]. The authors analyze
the game backwards, first focusing on the Nash equilibria at the second stage. They
set a condition for the profile $(\sigma_1,\sigma_1)$ to be a Nash equilibrium at the second stage.
Next, given that $(\sigma_1,\sigma_1)$ is fixed, they determine the set of amplitudes for which the
profile in which both players choose $\sigma_0$ at the first stage and $\sigma_1$ at the second one, i.e.
$((\sigma_0,\sigma_1),(\sigma_0,\sigma_1))$ in the notation of (6), is a Nash equilibrium of the game implied by (2)-(6). This
method of finding the Nash equilibria is not correct, since it does not include the possibility
that the players choose their actions depending on the result of the first stage. Although the
problem seems insignificant when the stage of a repeated game has a unique Nash
equilibrium, it becomes visible in the remaining cases. Let us consider the initial state (7)
satisfying the requirement

$$
2\left(|\lambda_{0000}|^2 + |\lambda_{1100}|^2\right) = |\lambda_{0011}|^2 + |\lambda_{1111}|^2
\tag{10}
$$

and let us take (8) to be the PD's payoffs. Then the expected payoffs for the players at the
second stage of the game defined by the scheme (2)-(6) are as follows:

$$
\begin{aligned}
&E_{i.2}((\cdot,\sigma_0),(\cdot,\sigma_0)) = E_{1.2}((\cdot,\sigma_1),(\cdot,\sigma_0)) = E_{2.2}((\cdot,\sigma_0),(\cdot,\sigma_1)) = 5|\Lambda|;\\
&E_{1.2}((\cdot,\sigma_0),(\cdot,\sigma_1)) = E_{2.2}((\cdot,\sigma_1),(\cdot,\sigma_0)) = 10|\Lambda|;\\
&E_{i.2}((\cdot,\sigma_1),(\cdot,\sigma_1)) = 7|\Lambda|,
\end{aligned}
\tag{11}
$$

where $|\Lambda| = |\lambda_{0000}|^2 + |\lambda_{1100}|^2$ and $i = 1,2$. The results (11) imply a continuum of Nash
equilibria at the second stage (this is easy to see, for example, when we draw the 2x2
bimatrix with entries defined by (11)), among them $((\cdot,\sigma_0),(\cdot,\sigma_1))$ and $((\cdot,\sigma_1),(\cdot,\sigma_0))$.
Bearing in mind the remark in Section 2 about possible profiles in the BoS game, a
correct protocol for quantum repeated games should be able to assign a payoff outcome
(by the measurement (5)) to a strategy profile in which different Nash equilibria are played
at the second stage depending on the actions chosen at the first one. However, a profile
in which the players play $((\cdot,\sigma_0),(\cdot,\sigma_1))$ at the second stage if the result of the first
stage is $((\sigma_0,\cdot),(\sigma_0,\cdot))$, and play $((\cdot,\sigma_1),(\cdot,\sigma_0))$ in the other cases, cannot be measured
by the scheme (2)-(6). Since only a two-qubit register is allotted to the second stage, it
allows only one pair of actions $(\sigma_{\kappa_3},\sigma_{\kappa_4})$ to be written before the measurement is made.
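The payoffs (11) and the multiplicity of second-stage equilibria can be reproduced numerically with the same kind of illustrative sketch (payoffs (8), pure strategies). Every state (7) satisfying (10) has $|\Lambda| = 1/3$, so the particular choice $\lambda_{0000} = \lambda_{1100} = 1/\sqrt{6}$, $\lambda_{0011} = \lambda_{1111} = 1/\sqrt{3}$ below is only an example. The first-stage operations are fixed to $\sigma_0$ because they do not affect $E_{i.2}$, and ties are counted as best responses.

```python
import math

# Payoffs (8): T=5, R=3, P=1, S=0; U[(y1, y2)] = (player 1, player 2).
U = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def it_payoffs(psi_in, s1, s2):
    """{(i, j): E_{i.j}} of (6) for pure strategies s1 = (k1, k3), s2 = (k2, k4)."""
    k = (s1[0], s2[0], s1[1], s2[1])
    E = {(i, j): 0.0 for i in (1, 2) for j in (1, 2)}
    for x, amp in psi_in.items():
        y = tuple(xq ^ kq for xq, kq in zip(x, k))   # basis state after the bit flips
        prob = abs(amp) ** 2
        for j, pair in ((1, y[:2]), (2, y[2:])):
            E[(1, j)] += prob * U[pair][0]
            E[(2, j)] += prob * U[pair][1]
    return E


# A state (7) satisfying (10); here |Lambda| = 1/6 + 1/6 = 1/3.
psi = {(0, 0, 0, 0): math.sqrt(1 / 6), (1, 1, 0, 0): math.sqrt(1 / 6),
       (0, 0, 1, 1): math.sqrt(1 / 3), (1, 1, 1, 1): math.sqrt(1 / 3)}
lam = 1 / 3

# Second-stage bimatrix in units of |Lambda| (first-stage operations fixed to sigma_0).
table = {}
for k3 in (0, 1):
    for k4 in (0, 1):
        E = it_payoffs(psi, (0, k3), (0, k4))
        table[(k3, k4)] = (E[(1, 2)] / lam, E[(2, 2)] / lam)

# Pure Nash equilibria of that bimatrix, counting ties as best responses.
tol = 1e-9
nash = {(a, b) for (a, b), (u1, u2) in table.items()
        if u1 >= table[(1 - a, b)][0] - tol and u2 >= table[(a, 1 - b)][1] - tol}
print({key: (round(v1, 6), round(v2, 6)) for key, (v1, v2) in table.items()})
print(sorted(nash))  # [(0, 0), (0, 1), (1, 0)]
```

The ties (player 2 is indifferent whenever player 1 plays $\sigma_0$, and vice versa) are exactly what produces the continuum of equilibria mentioned above.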

An argument against the scheme in [2] can be expressed in another way. Namely, all
results included in [2] can be obtained by considering a simplified version of the protocol
(2)-(6), in which the sequential procedure (3)-(4) for determining the final state $\rho_{\mathrm{fin}}$ is simply
replaced with

$$
\rho_{\mathrm{fin}} = \left(\bigotimes_{i=1}^{4}\sigma_{\kappa_i}\right)\rho_{\mathrm{in}}\left(\bigotimes_{j=1}^{4}\sigma_{\kappa_j}\right).
\tag{12}
$$

In this case, the first and the second player simultaneously pick $(\sigma_{\kappa_1},\sigma_{\kappa_3})$ and $(\sigma_{\kappa_2},\sigma_{\kappa_4})$,
respectively, having essentially only four strategies each. However, as we mentioned in
the previous section, each player has 32 strategies in the classical twice repeated game.
As a result, the protocol (2)-(6) cannot coincide with the classical case if $|\psi_{\mathrm{in}}\rangle = |0000\rangle$.
Although the authors assume that a player knows her opponent's previous action,
the scheme (2)-(6) does not take it into account. Consequently,
a game quantized by (2)-(6) differs from the game in Fig. 2 in that the nodes 1.2,
1.3, 1.4 and 1.5 (2.2, 2.3, 2.4 and 2.5) lie in the same information set (are connected
with a dotted line).

3.2 The misconception about the cooperative strategy in the 
PD played via the MW approach 

Another fault we are going to discuss is based on misinterpreting the operators $\sigma_0$
and $\sigma_1$ as cooperation and defection in the protocol (2)-(6). Let us consider an
initial state $|\psi_{\mathrm{in}}\rangle$ in which the first two qubits, associated with the first stage, are prepared in
the state $|x_1,x_2\rangle$ for $x_1,x_2\in\{0,1\}$. Then the first stage of the game given by (2)-(6) is
isomorphic to the classical PD game. When the initial state is $|00\rangle$, $\sigma_0$ corresponds
to the action C and $\sigma_1$ corresponds to D. However, when the initial state is $|11\rangle$, the
action 'cooperate' is identified with $\sigma_1$ and the action 'defect' with $\sigma_0$, since by putting
$\rho_{\mathrm{fin}} = (\sigma_{\kappa_1}\otimes\sigma_{\kappa_2})|11\rangle\langle 11|(\sigma_{\kappa_1}\otimes\sigma_{\kappa_2})$ into the formula (6) (restricted to the first two qubits) we have

$$
\left(\mathrm{tr}(X_{1.1}\rho_{\mathrm{fin}}),\ \mathrm{tr}(X_{2.1}\rho_{\mathrm{fin}})\right) =
\begin{cases}
(R,R), & \text{if } (\kappa_1,\kappa_2) = (1,1);\\
(S,T), & \text{if } (\kappa_1,\kappa_2) = (1,0);\\
(T,S), & \text{if } (\kappa_1,\kappa_2) = (0,1);\\
(P,P), & \text{if } (\kappa_1,\kappa_2) = (0,0).
\end{cases}
\tag{13}
$$

That is, the outcome of the game does not depend intrinsically on the operators but
on the initial state and on which final states $\rho_{\mathrm{fin}}$ can be obtained through
the available operators. Thus, identifying operators with actions taken in the classical
game without taking into account the form of the initial state is not correct. The
misidentification assumed in [2] implies that the condition (9) cannot solve the problem
formulated in that paper. This is clearly visible when we take, for example, the initial state
$|\psi_{\mathrm{in}}\rangle = |1100\rangle$. It satisfies the inequalities (9); thus $\sigma_0$ is optimal at the first stage for
each player. In fact, $\sigma_0$ is the action 'defect', as shown in (13). Note also that the
payoff corresponding to the profile $(\sigma_0,\sigma_0)$ at the first stage and $(\sigma_1,\sigma_1)$ at the second
one is 2P for each player, which is the total payoff for mutual defection. Thus, the condition (9) does
not ensure cooperation at the first stage.
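The counterexample can be checked with the same illustrative simulator (payoffs (8), pure strategies only; `it_payoffs`, `prob` and `total` are hypothetical helper names). For $|\psi_{\mathrm{in}}\rangle = |1100\rangle$ the sketch confirms three things. The conditions (9) hold. The profile with $\sigma_0$ at the first stage and $\sigma_1$ at the second is an equilibrium of the resulting 4 x 4 game. That profile gives 2P = 2 to each player.

```python
from itertools import product

# Payoffs (8): T=5, R=3, P=1, S=0; U[(y1, y2)] = (player 1, player 2).
U = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def it_payoffs(psi_in, s1, s2):
    """{(i, j): E_{i.j}} of (6) for pure strategies s1 = (k1, k3), s2 = (k2, k4)."""
    k = (s1[0], s2[0], s1[1], s2[1])
    E = {(i, j): 0.0 for i in (1, 2) for j in (1, 2)}
    for x, amp in psi_in.items():
        y = tuple(xq ^ kq for xq, kq in zip(x, k))
        prob_x = abs(amp) ** 2
        for j, pair in ((1, y[:2]), (2, y[2:])):
            E[(1, j)] += prob_x * U[pair][0]
            E[(2, j)] += prob_x * U[pair][1]
    return E


psi = {(1, 1, 0, 0): 1.0}                      # |psi_in> = |1100>


def prob(x):
    """|lambda_x|^2 for a basis tuple x (zero if absent)."""
    return abs(psi.get(x, 0)) ** 2


cond = (prob((0, 0, 0, 0)) + prob((0, 0, 1, 1)) <= 1 / 3,   # conditions (9)
        prob((0, 0, 1, 1)) + prob((1, 1, 1, 1)) <= 1 / 3)


def total(s1, s2):
    """(E_1, E_2) with E_i = E_{i.1} + E_{i.2}."""
    E = it_payoffs(psi, s1, s2)
    return E[(1, 1)] + E[(1, 2)], E[(2, 1)] + E[(2, 2)]


strategies = list(product((0, 1), repeat=2))   # (first-stage op, second-stage op)
star = (0, 1)                                   # sigma_0 first, sigma_1 second
is_nash = (all(total(star, star)[0] >= total(s, star)[0] for s in strategies) and
           all(total(star, star)[1] >= total(star, s)[1] for s in strategies))
print(cond, is_nash, total(star, star))        # (True, True) True (2.0, 2.0)
```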

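Quite the opposite: it turns out that the players never cooperate when they play the
game defined by (2)-(6). Let us consider any initial state (2) in which the first and the
second qubit are prepared in such a way that for $(s_1,s_2) = ((\sigma_{\kappa_1},\sigma_{\kappa_3}),(\sigma_{\kappa_2},\sigma_{\kappa_4}))$ we have

$$
\left(E_{1.1}(s_1,s_2),\ E_{2.1}(s_1,s_2)\right) =
\begin{cases}
(R',R'), & \text{if } (\kappa_1,\kappa_2) = (0,0);\\
(S',T'), & \text{if } (\kappa_1,\kappa_2) = (0,1);\\
(T',S'), & \text{if } (\kappa_1,\kappa_2) = (1,0);\\
(P',P'), & \text{if } (\kappa_1,\kappa_2) = (1,1),
\end{cases}
\tag{14}
$$

where the values $T'$, $R'$, $P'$, $S'$ meet the requirements of the PD given in Fig. 1(a), so the
operators $\sigma_0$ and $\sigma_1$ can be regarded as cooperation and defection, respectively. Next,
let us estimate the difference

$$
E_1((\sigma_1,\sigma_0),s_2) - E_1((\sigma_0,\sigma_0),s_2) \quad\text{for any } s_2 = (\sigma_{\kappa_2},\sigma_{\kappa_4}),
\tag{15}
$$

where $E_1 = E_{1.1} + E_{1.2}$. Both strategies of player 1 apply the same operator $\sigma_0$ to the third
qubit, and player 2's operator on the fourth qubit is fixed by $s_2$. Hence
$E_{1.2}((\sigma_0,\sigma_0),s_2) = E_{1.2}((\sigma_1,\sigma_0),s_2)$, and the difference (15) depends only
on $E_{1.1}$. Thus, for $s_2 = (\sigma_{\kappa_2},\sigma_{\kappa_4})$, we obtain from (14) that

$$
E_1((\sigma_1,\sigma_0),s_2) - E_1((\sigma_0,\sigma_0),s_2) =
\begin{cases}
T' - R' > 0, & \text{if } \kappa_2 = 0;\\
P' - S' > 0, & \text{if } \kappa_2 = 1.
\end{cases}
\tag{16}
$$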
In a similar way we can prove that the strategy $(\sigma_0,\sigma_1)$ of player 1 is strictly dominated
by $(\sigma_1,\sigma_1)$. As a result, we conclude that $\sigma_1$ is the best response of player 1 at the first
stage. The symmetry of payoffs in the PD implies that the strategy $(\sigma_0,\sigma_0)$ of player 2 is strictly
dominated by $(\sigma_1,\sigma_0)$, and $(\sigma_0,\sigma_1)$ is strictly dominated by $(\sigma_1,\sigma_1)$. Thus, there
is no Nash equilibrium in which the players choose $\sigma_0$ (cooperation) at the first stage.
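Under (14) the differences (15) do not depend on what is prepared on the third and fourth qubits. Here is a hedged numerical check. It assumes the payoffs (8) and the simplest family of states satisfying (14): $|00\rangle\otimes|\chi\rangle$, with a random two-qubit state $|\chi\rangle$ on qubits 3 and 4, so that $T' = T$, $R' = R$, $P' = P$, $S' = S$. The helper names are hypothetical.

```python
import math
import random
from itertools import product

# Payoffs (8): T=5, R=3, P=1, S=0; U[(y1, y2)] = (player 1, player 2).
U = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
T, R, P, S = 5, 3, 1, 0


def it_payoffs(psi_in, s1, s2):
    """{(i, j): E_{i.j}} of (6) for pure strategies s1 = (k1, k3), s2 = (k2, k4)."""
    k = (s1[0], s2[0], s1[1], s2[1])
    E = {(i, j): 0.0 for i in (1, 2) for j in (1, 2)}
    for x, amp in psi_in.items():
        y = tuple(xq ^ kq for xq, kq in zip(x, k))
        prob = abs(amp) ** 2
        for j, pair in ((1, y[:2]), (2, y[2:])):
            E[(1, j)] += prob * U[pair][0]
            E[(2, j)] += prob * U[pair][1]
    return E


def total1(psi, s1, s2):
    """E_1 = E_{1.1} + E_{1.2} for player 1."""
    E = it_payoffs(psi, s1, s2)
    return E[(1, 1)] + E[(1, 2)]


random.seed(2)
diffs = []
for _ in range(5):
    raw = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
    # |psi_in> = |00> (x) |chi>, with |chi> a random two-qubit state on qubits 3 and 4
    psi = {(0, 0) + b: a / norm for b, a in zip(product((0, 1), repeat=2), raw)}
    for s2 in product((0, 1), repeat=2):
        diffs.append((s2[0],
                      total1(psi, (1, 0), s2) - total1(psi, (0, 0), s2),    # difference (15)
                      total1(psi, (1, 1), s2) - total1(psi, (0, 1), s2)))  # second dominance
ok = all(math.isclose(d1, T - R if k2 == 0 else P - S) and
         math.isclose(d2, T - R if k2 == 0 else P - S) for k2, d1, d2 in diffs)
print(ok)  # True
```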

4 The MW approach to twice repeated quantum 
games 

In this section we propose a scheme of playing a twice repeated 2x2 quantum game
that is free from the faults pointed out in the previous section. Our construction
is based on the protocol that we proposed in [5], where general finite extensive quantum
games were considered. Since a repeated game is a special case of an extensive game,
we can adapt this concept. Next, we examine what results can be obtained from such a
protocol. In particular, we re-examine the problem of cooperation studied in [2].

4.1 Construction of a twice repeated 2x2 quantum game via 
the MW protocol 

Let us consider a 2x2 game defined by the outcomes $O_{\iota_1\iota_2}$, $\iota_1,\iota_2 = 0,1$. The twice
repeated 2x2 quantum game played according to the MW approach is as follows.
Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes 10}$ be a Hilbert space with the computational basis $\{|x_1,x_2,\dots,x_{10}\rangle\}$,
$x_j = 0,1$. Then the initial state of the game is a ten-qubit pure state represented by
a unit vector in the space $\mathcal{H}$:

$$
|\psi_{\mathrm{in}}\rangle = \sum_{x=0}^{2^{10}-1}\lambda_x|x\rangle, \quad\text{for } \lambda_x\in\mathbb{C} \text{ and } \sum_{x=0}^{2^{10}-1}|\lambda_x|^2 = 1,
\tag{17}
$$

where the sum is over all possible decimal values of $x = (x)_{10} = (x_1x_2\dots x_{10})_2$. The
players are allowed to apply the operators $\sigma_0$ and $\sigma_1$. The qubits with odd indices are
manipulated by player 1 and the qubits with even indices by player 2.
Such an assignment implies 32 possible strategies for each player, as they specify five
operations $\sigma^j_{\kappa_j}$ (where $j$ and $\kappa_j$ indicate the qubit number and the operation number, respectively) on
their own qubits. We denote player $i$'s strategy by $\tau_i = (\sigma^{i}_{\kappa_i},\sigma^{i+2}_{\kappa_{i+2}},\sigma^{i+4}_{\kappa_{i+4}},\sigma^{i+6}_{\kappa_{i+6}},\sigma^{i+8}_{\kappa_{i+8}})$,
where $i = 1,2$. The profile $\tau = (\tau_1,\tau_2)$ gives rise to the final state

$$
|\psi^{\tau}_{\mathrm{fin}}\rangle = \bigotimes_{j=1}^{10}\sigma^j_{\kappa_j}|\psi_{\mathrm{in}}\rangle.
\tag{18}
$$

If the players take $\tau_1^t$ and $\tau_2^{t'}$ with probabilities $p_t$ and $q_{t'}$, respectively, which
corresponds to the state $|\psi^{t,t'}_{\mathrm{fin}}\rangle$ (defined by (18)) with probability $p_tq_{t'}$, then the final state is
the density operator associated with the ensemble $\{p_tq_{t'}, |\psi^{t,t'}_{\mathrm{fin}}\rangle\}$. That is,

$$
\rho_{\mathrm{fin}} = \sum_{t,t'}p_tq_{t'}|\psi^{t,t'}_{\mathrm{fin}}\rangle\langle\psi^{t,t'}_{\mathrm{fin}}|.
\tag{19}
$$

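So far, the difference between the concept in [2] and our protocol lies in the dimension
of the space $\mathcal{H}$. The next difference is clearly visible in the description of the measurement
operators. The measurement on $\rho_{\mathrm{fin}}$ that determines the outcome of the game is
described by the collection $\{X_1, X_{2.00}, X_{2.01}, X_{2.10}, X_{2.11}\}$, whose components are defined
as follows:

$$
X_1 = \sum_{x_1,x_2\in\{0,1\}}O_{x_1x_2}|x_1,x_2\rangle\langle x_1,x_2|\otimes I^{\otimes 8},
\tag{20}
$$

$$
\begin{aligned}
X_{2.00} &= \sum_{x_3,x_4\in\{0,1\}}O_{x_3x_4}|00\rangle\langle 00|\otimes|x_3,x_4\rangle\langle x_3,x_4|\otimes I^{\otimes 6};\\
X_{2.01} &= \sum_{x_5,x_6\in\{0,1\}}O_{x_5x_6}|01\rangle\langle 01|\otimes I^{\otimes 2}\otimes|x_5,x_6\rangle\langle x_5,x_6|\otimes I^{\otimes 4};\\
X_{2.10} &= \sum_{x_7,x_8\in\{0,1\}}O_{x_7x_8}|10\rangle\langle 10|\otimes I^{\otimes 4}\otimes|x_7,x_8\rangle\langle x_7,x_8|\otimes I^{\otimes 2};\\
X_{2.11} &= \sum_{x_9,x_{10}\in\{0,1\}}O_{x_9x_{10}}|11\rangle\langle 11|\otimes I^{\otimes 6}\otimes|x_9,x_{10}\rangle\langle x_9,x_{10}|.
\end{aligned}
\tag{21}
$$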

Then the expected outcomes, $E_{i.1}$ at the first stage and $E_{i.2}$ at the second stage, are
calculated by using the following formulae (player $i$'s expected payoff being the $i$-th component of the resulting outcome vector):

$$
E_{i.1} = \mathrm{tr}(X_1\rho_{\mathrm{fin}}), \qquad E_{i.2} = \mathrm{tr}\Big(\sum_{\iota_1,\iota_2=0,1}X_{2.\iota_1\iota_2}\rho_{\mathrm{fin}}\Big).
\tag{22}
$$

Let us justify our construction. Notice that $2^{10}$ is the minimal dimension of
the space $\mathcal{H}$ that allows the twice repeated 2x2 game to be played in this way. Since a player's strategy
in a twice repeated 2x2 game specifies an action at the first stage and at each of the four
subgames determined by the outcome of the first stage, the quantum protocol needs a five-qubit
register to write a player's strategy. The first two qubits are used to perform operations
at the first stage of the repeated game. Given the form of $X_1$ and the players' strategies
restricted to the first and the second qubit, the protocol (17)-(22) in fact coincides with
the MW scheme of playing a 2x2 quantum game [1]. The remaining eight qubits are used to
define the players' moves at the second stage. That is, by pairing consecutive qubits from the
third qubit onwards, actions at the second stage are defined on the appropriate pair of qubits
depending on the outcome of the previous stage. For example, given that the outcome $O_{10}$ has
occurred at the first stage (that is, the outcome 10 on the first two qubits has been measured),
the relevant contribution to $E_{i.2}$ is $\mathrm{tr}(X_{2.10}\rho_{\mathrm{fin}})$, which depends only on the operations on
qubits 7 and 8. The players then play the second stage in the same way as in the protocol (2)-(6).
However, contrary to the previous idea, each player specifies her move for each possible
outcome $O_{x_1x_2}$.
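Here is a minimal sketch of the protocol (17)-(22) for pure strategies. Since $\sigma_0$ and $\sigma_1$ permute computational basis states and the operators (20)-(21) are diagonal, each basis state of $|\psi_{\mathrm{in}}\rangle$ contributes its probability times the eigenvalue it lands on. The PD outcome values, the dictionary representation of states and the helper names `expected_outcomes` and `interleave` are assumptions of the example. With $|\psi_{\mathrm{in}}\rangle = |0\rangle^{\otimes 10}$ it reproduces the classical path of play.

```python
from itertools import product

# Assumed outcome vectors O_{x1 x2} of a PD stage game (T=5, R=3, P=1, S=0).
O = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def expected_outcomes(psi_in, kappa):
    """(E_{1.1}, E_{2.1}, E_{1.2}, E_{2.2}) of (22) for a pure profile.

    psi_in maps 10-bit basis tuples (x1, ..., x10) to amplitudes; kappa[j-1] = 1 means
    sigma_1 on qubit j.  Each basis state contributes |amplitude|^2 times its eigenvalue.
    """
    e = [0.0] * 4
    for x, amp in psi_in.items():
        y = tuple(b ^ k for b, k in zip(x, kappa))    # basis state after (18)
        p = abs(amp) ** 2
        iota = 2 * y[0] + y[1]                         # which X_{2.x1x2} is nonzero
        first, second = O[y[:2]], O[(y[2 * iota + 2], y[2 * iota + 3])]
        for n, v in enumerate(first + second):
            e[n] += p * v
    return e


def interleave(tau1, tau2):
    """Per-qubit operations: tau1 fills the odd qubits 1,3,...,9, tau2 the even ones."""
    return [k for pair in zip(tau1, tau2) for k in pair]


# |psi_in> = |0>^{(x)10}: player 1 starts with C and then copies player 2's first move;
# player 2 always plays D (strategy order: first stage, then after 00, 01, 10, 11).
psi0 = {(0,) * 10: 1.0}
print(expected_outcomes(psi0, interleave((0, 0, 1, 0, 1), (1, 1, 1, 1, 1))))  # [0.0, 5.0, 1.0, 1.0]
```

Because strategies are interleaved qubit by qubit, each player's five-qubit register carries exactly the five-tuple strategy of the classical twice repeated game.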



A game generated by our scheme naturally coincides with the classical case when an
appropriate initial state is prepared. We prove this fact by means of a convenient sequential
approach to (17)-(22) provided in the next section.



4.2 Extensive form of a quantum twice repeated 2x2 game 

The protocol (17)-(22) allows us to put a game into extensive form by a method similar
to the one described in [5]. The extensive form is obtained by calculating the final state
$\rho_{\mathrm{fin}}$ sequentially according to the following procedure. At first the players
manipulate the first pair of qubits. Then a measurement in the computational basis is
made on these qubits (as a result, an outcome $O_{\iota_1\iota_2}$ of the first stage is returned). The
measured outcome is sent to the players. Depending on the measurement outcome $\iota_1,\iota_2$,
which occurs with probability $p(\iota_1,\iota_2)$, the players act on the next pair of qubits: if $\iota_1,\iota_2$
is observed, then player 1 and player 2 manipulate qubits $2\iota+3$ and $2\iota+4$, respectively,
where $\iota = (\iota_1\iota_2)_2$ is the decimal representation of the binary number $\iota_1\iota_2$. The procedure
can be formally described as follows:

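Algorithm 4.1

1. $(\sigma^1_{\kappa_1}\otimes\sigma^2_{\kappa_2}\otimes I^{\otimes 8})|\psi_{\mathrm{in}}\rangle = |\psi\rangle$. The players perform their operations $\sigma^1_{\kappa_1}$ and $\sigma^2_{\kappa_2}$
on the initial state $|\psi_{\mathrm{in}}\rangle$.

2. $\dfrac{M_{\iota_1\iota_2}|\psi\rangle}{\sqrt{\langle\psi|M_{\iota_1\iota_2}|\psi\rangle}} = |\psi_{\iota_1\iota_2}\rangle$. The first two qubits in the state $|\psi\rangle$ are measured.
The measurement is described by the collection
$\{M_{\iota_1\iota_2} : M_{\iota_1\iota_2} = |\iota_1,\iota_2\rangle\langle\iota_1,\iota_2|\otimes I^{\otimes 8},\ \iota_1,\iota_2 = 0,1\}$.

3. $\{p(\iota_1,\iota_2),\ (\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}})|\psi_{\iota_1\iota_2}\rangle\}$, where $p(\iota_1,\iota_2) = \langle\psi|M_{\iota_1\iota_2}|\psi\rangle$. Given that the outcome $\iota_1,\iota_2$ has been observed,
players 1 and 2 perform the operations $\sigma^{2\iota+3}_{\kappa_{2\iota+3}}$ and $\sigma^{2\iota+4}_{\kappa_{2\iota+4}}$ on the post-measurement state.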

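It turns out that we can prove the following.

Proposition 4.2 The density operator $|\psi^{\tau}_{\mathrm{fin}}\rangle\langle\psi^{\tau}_{\mathrm{fin}}|$ associated with the state (18) and the
density operator for the ensemble $\{p(\iota_1,\iota_2),\ (\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}})|\psi_{\iota_1\iota_2}\rangle\}$ in Algorithm 4.1
determine the same outcomes $E_{i.1}$ and $E_{i.2}$ with regard to the measurement (20)-(21).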

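Proof. Let us put $\rho = |\psi\rangle\langle\psi|$. Given that $|\psi_{\iota_1\iota_2}\rangle\langle\psi_{\iota_1\iota_2}| = M_{\iota_1\iota_2}\rho M_{\iota_1\iota_2}/p(\iota_1,\iota_2)$, the
state $\rho'_{\mathrm{fin}}$ determined by the sequential procedure can be written as

$$
\rho'_{\mathrm{fin}} = \sum_{\iota_1,\iota_2=0,1}\left(\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}}\right)M_{\iota_1\iota_2}\rho M_{\iota_1\iota_2}\left(\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}}\right).
\tag{23}
$$

Since only the first and the second qubits are measured, any operation $\sigma^j_{\kappa_j}$ with $j\neq 1,2$
commutes with $M_{\iota_1\iota_2}$ and does not influence the measurement. Therefore we have

$$
\rho'_{\mathrm{fin}} = \sum_{\iota_1,\iota_2=0,1}M_{\iota_1\iota_2}\left(\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}}\right)\rho\left(\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}}\right)M_{\iota_1\iota_2}.
\tag{24}
$$

Note that $X_{2.\iota_1\iota_2}M_{\iota'_1\iota'_2} = \delta_{\iota\iota'}X_{2.\iota_1\iota_2}$, where $\delta_{\iota\iota'}$ is the Kronecker delta, $\iota = (\iota_1\iota_2)_2$
and $\iota' = (\iota'_1\iota'_2)_2$. Using the form (24) of $\rho'_{\mathrm{fin}}$ we have

$$
\mathrm{tr}\Big(\sum_{\iota_1,\iota_2=0,1}X_{2.\iota_1\iota_2}\rho'_{\mathrm{fin}}\Big) = \mathrm{tr}\Big(\sum_{\iota_1,\iota_2=0,1}X_{2.\iota_1\iota_2}\left(\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}}\right)\rho\left(\sigma^{2\iota+3}_{\kappa_{2\iota+3}}\otimes\sigma^{2\iota+4}_{\kappa_{2\iota+4}}\right)\Big).
\tag{25}
$$

For each $\iota$, the trace of the corresponding term of the sum on the right-hand side of (25)
depends only on the operations $\sigma^j_{\kappa_j}$ on the qubits $j\in\{1,2,2\iota+3,2\iota+4\}$. Thus,
equation (25) still holds when the remaining operations $\sigma^j_{\kappa_j}$ are added:

$$
\mathrm{tr}\Big(\sum_{\iota_1,\iota_2=0,1}X_{2.\iota_1\iota_2}\rho'_{\mathrm{fin}}\Big) = \mathrm{tr}\Big(\sum_{\iota_1,\iota_2=0,1}X_{2.\iota_1\iota_2}\bigotimes_{j=3}^{10}\sigma^j_{\kappa_j}\,\rho\bigotimes_{j=3}^{10}\sigma^j_{\kappa_j}\Big).
\tag{26}
$$

Since $\rho = (\sigma^1_{\kappa_1}\otimes\sigma^2_{\kappa_2})\rho_{\mathrm{in}}(\sigma^1_{\kappa_1}\otimes\sigma^2_{\kappa_2})$, the right-hand side of (26) is evaluated on the state (18).
As a result, the left-hand side of (26) is equal to the expected outcome $E_{i.2}$ associated
with the final state $|\psi^{\tau}_{\mathrm{fin}}\rangle\langle\psi^{\tau}_{\mathrm{fin}}|$. To prove that $\rho'_{\mathrm{fin}}$ also determines the expected outcome
$E_{i.1}$, note that $X_1$ and $\{M_{\iota_1\iota_2}\}$ are the same projective measurement up to the
eigenvalues. Hence

$$
\mathrm{tr}(X_1\rho'_{\mathrm{fin}}) = \mathrm{tr}(X_1\rho) = \mathrm{tr}\Big(X_1\bigotimes_{j=3}^{10}\sigma^j_{\kappa_j}\,\rho\bigotimes_{j=3}^{10}\sigma^j_{\kappa_j}\Big).
\tag{27}
$$

Since $\rho = (\sigma^1_{\kappa_1}\otimes\sigma^2_{\kappa_2})\rho_{\mathrm{in}}(\sigma^1_{\kappa_1}\otimes\sigma^2_{\kappa_2})$, we obtain

$$
\mathrm{tr}(X_1\rho'_{\mathrm{fin}}) = \mathrm{tr}\Big(X_1\bigotimes_{i=1}^{10}\sigma^i_{\kappa_i}\,\rho_{\mathrm{in}}\bigotimes_{i=1}^{10}\sigma^i_{\kappa_i}\Big).
\tag{28}
$$

Equations (26) and (28) show that the state determined by the sequential procedure
and the state (18) yield the same outcomes $E_{i.1}$ and $E_{i.2}$ for $i = 1,2$. In the same way as
above, using the linearity of the trace, it can be proved that the equivalence also holds if the
players pick nondegenerate mixed strategies. □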
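Proposition 4.2 can also be checked numerically. The sketch below evaluates a random ten-qubit initial state in two ways for a random pure profile. The first is the direct route, from (18) with (20)-(22). The second goes through the three steps of Algorithm 4.1. The PD outcome values, the random seed and all helper names are illustrative assumptions. The proposition predicts agreement up to rounding error.

```python
import math
import random
from itertools import product

# Illustrative outcome vectors O_{x1 x2} (assumed PD values T=5, R=3, P=1, S=0).
O = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def second_pair(first):
    """0-based positions of qubits 2*iota+3 and 2*iota+4, where iota = (i1 i2)_2."""
    iota = 2 * first[0] + first[1]
    return 2 * iota + 2, 2 * iota + 3


def apply_ops(state, qubits, kappa):
    """Apply sigma_{kappa_j} to the given 0-based qubits of a {basis tuple: amplitude} state."""
    out = {}
    for basis, amp in state.items():
        b = list(basis)
        for q in qubits:
            b[q] ^= kappa[q]                   # sigma_1 flips the bit, sigma_0 leaves it
        out[tuple(b)] = amp
    return out


def outcomes(state, weight=1.0):
    """weight * (E_{1.1}, E_{2.1}, E_{1.2}, E_{2.2}) read off with the measurement (20)-(21)."""
    e = [0.0] * 4
    for b, amp in state.items():
        p = weight * abs(amp) ** 2
        i, j = second_pair(b[:2])
        o1, o2 = O[b[:2]], O[(b[i], b[j])]
        for n, v in enumerate(o1 + o2):
            e[n] += p * v
    return e


def direct(psi_in, kappa):
    """Outcomes for the final state (18): all ten operations applied at once."""
    return outcomes(apply_ops(psi_in, range(10), kappa))


def sequential(psi_in, kappa):
    """Outcomes for the ensemble produced by Algorithm 4.1."""
    psi = apply_ops(psi_in, (0, 1), kappa)                         # step 1
    total = [0.0] * 4
    for first in product((0, 1), repeat=2):
        branch = {b: a for b, a in psi.items() if b[:2] == first}  # M_{i1 i2}|psi>
        p = sum(abs(a) ** 2 for a in branch.values())              # p(i1, i2)
        if p == 0:
            continue
        post = {b: a / math.sqrt(p) for b, a in branch.items()}    # step 2
        post = apply_ops(post, second_pair(first), kappa)          # step 3
        total = [t + c for t, c in zip(total, outcomes(post, p))]
    return total


# A random normalised ten-qubit initial state and a random pure profile.
random.seed(0)
raw = {b: complex(random.gauss(0, 1), random.gauss(0, 1)) for b in product((0, 1), repeat=10)}
norm = math.sqrt(sum(abs(a) ** 2 for a in raw.values()))
psi_in = {b: a / norm for b, a in raw.items()}
kappa = [random.randint(0, 1) for _ in range(10)]
print(direct(psi_in, kappa))
print(sequential(psi_in, kappa))   # agrees with the line above up to rounding
```

Representing states as dictionaries keeps the example dependency-free. For larger registers a dense array library would be the more natural choice.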

Having a sequential approach that conforms with the protocol (17)-(22), we are able to
analyze a quantum repeated game through its extensive form. This can simplify the work
significantly, bearing in mind the 32 x 32 bimatrix associated with the normal representation
of a twice repeated 2x2 game. Let us study the game tree drawn from the sequential
procedure if the initial state (17) takes the form

$$
|\psi_{\mathrm{in}}\rangle = \lambda_0|0\rangle^{\otimes 10} + \lambda_1|1\rangle^{\otimes 10}.
\tag{29}
$$

Let us apply the sequential procedure step by step. At first the players manipulate with $\sigma^1_{\kappa_1}$
and $\sigma^2_{\kappa_2}$. Hence we obtain the following state:

$$
(\sigma^1_{\kappa_1}\otimes\sigma^2_{\kappa_2}\otimes I^{\otimes 8})|\psi_{\mathrm{in}}\rangle = \lambda_0|\kappa_1,\kappa_2\rangle|0\rangle^{\otimes 8} + \lambda_1|\bar{\kappa}_1,\bar{\kappa}_2\rangle|1\rangle^{\otimes 8},
\tag{30}
$$

where $\bar{\kappa}_j$ is the negation of $\kappa_j$. The game tree at this phase is just the game tree corresponding
to a 2x2 game (see Fig. 1(b)), where for $j = 1,2$ the values $\kappa_j = 0,1$ are associated
with the respective branches of that game tree. After a sequence of actions $(\sigma^1_{\kappa_1},\sigma^2_{\kappa_2})$ the
measurement $\{M_{\iota_1\iota_2}\}$ is made. Let us focus on the cases in which the measurement outcome
00 or 11 has been observed. The form of (30) tells us that the measurement outcomes
00 and 11 are possible only if the profile at the first stage takes the form $(\sigma^1_{\kappa},\sigma^2_{\kappa})$,
where $\kappa = 0,1$. Then the probability $p(00)$ ($p(11)$) that the measurement outcome
00 (11) occurs is equal to $|\lambda_\kappa|^2$ ($|\lambda_{\bar{\kappa}}|^2$). Thus, the game tree is extended to include
random moves 00 and 11 with the associated probabilities after both histories $(\sigma^1_{\kappa},\sigma^2_{\kappa})$.
Since further moves of the players depend only on the measurement outcome, the pair of histories
$(\sigma^1_0,\sigma^2_0,00)$, $(\sigma^1_1,\sigma^2_1,00)$ and the pair $(\sigma^1_0,\sigma^2_0,11)$, $(\sigma^1_1,\sigma^2_1,11)$ constitute two separate
information sets. Next, given that 00 (11) has occurred, following the sequential
procedure, the players manipulate the third and fourth (ninth and tenth) qubits at the second
stage. Therefore, another extensive form of a 2x2 game is added to each sequence $(\sigma^1_{\kappa},\sigma^2_{\kappa},\iota\iota)$,
where $\kappa,\iota = 0,1$. In consequence we obtain the game tree shown in Fig. 3 (the part of the
game tree after the histories $(\sigma^1_{\kappa},\sigma^2_{\bar{\kappa}})$, $\kappa = 0,1$, is similar). Each outcome associated with
a terminal sequence is determined by a pure state from the ensemble given by the
sequential procedure. For example, after the sequence $(\sigma^1_1,\sigma^2_1)$ the post-measurement state
takes the form $|0\rangle^{\otimes 2}|1\rangle^{\otimes 8}$ (up to a global phase factor) with probability $|\lambda_1|^2$, and the
players choose the sequence $(\sigma^3_{\kappa_3},\sigma^4_{\kappa_4})$. Then the total outcome $E_{\cdot.1} + E_{\cdot.2}$ associated
with the sequence $(\sigma^1_1,\sigma^2_1,00,\sigma^3_{\kappa_3},\sigma^4_{\kappa_4})$ is calculated according to the formulae (22):

$$
E_{\cdot.1} + E_{\cdot.2} = \mathrm{tr}\Big(\Big(X_1 + \sum_{\iota_1,\iota_2=0,1}X_{2.\iota_1\iota_2}\Big)(|0\rangle\langle 0|)^{\otimes 2}\otimes|\bar{\kappa}_3,\bar{\kappa}_4\rangle\langle\bar{\kappa}_3,\bar{\kappa}_4|\otimes(|1\rangle\langle 1|)^{\otimes 6}\Big).
\tag{31}
$$

Figure 3: The extensive form for a twice repeated Prisoner's Dilemma played through
the protocol (17)-(22) when the initial state is of the form (29).

The extensive approach allows us to see directly that our scheme coincides with the
classical twice repeated 2x2 game when $|\psi_{\mathrm{in}}\rangle = |0\rangle^{\otimes 10}$. Without loss of generality,
let the outcomes $O_{\iota_1\iota_2}$ be the payoff outcomes corresponding to the PD game. Then,
putting $|\lambda_0|^2 = 1$ in (29) and assuming $\sigma_0 := C$, $\sigma_1 := D$, the game in Fig. 3 depicts
exactly the classical twice repeated PD game (compare Fig. 2 and Fig. 3).
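The branch structure described for the state (29) can be illustrated with a short sketch that follows only the first step of the sequential procedure and the measurement. Since (30) is a superposition of two basis states whose first two qubits differ, each measurement outcome leaves a single basis state. The PD outcome values, the choice $|\lambda_0|^2 = 0.3$ and the helper names `chance_moves` and `total_outcome` are assumptions made for the example.

```python
import math

# Assumed outcome vectors O_{x1 x2} of a PD stage game (T=5, R=3, P=1, S=0).
O = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def chance_moves(l0, l1, k1, k2):
    """Measurement outcomes after the first-stage profile (k1, k2) for the state (29).

    Returns {(i1, i2): (probability, post-measurement basis state)}; by (30) the two
    terms end up with different first pairs, so each branch is a single basis state.
    """
    branches = {}
    for amp, fill in ((l0, 0), (l1, 1)):
        if abs(amp) > 0:
            state = (fill ^ k1, fill ^ k2) + (fill,) * 8   # term of (30)
            branches[state[:2]] = (abs(amp) ** 2, state)
    return branches


def total_outcome(state, k_odd, k_even):
    """E_{.1} + E_{.2} when the players apply k_odd, k_even to the pair selected by the outcome."""
    iota = 2 * state[0] + state[1]
    i, j = 2 * iota + 2, 2 * iota + 3      # 0-based positions of qubits 2*iota+3, 2*iota+4
    o1, o2 = O[state[:2]], O[(state[i] ^ k_odd, state[j] ^ k_even)]
    return (o1[0] + o2[0], o1[1] + o2[1])


l0, l1 = math.sqrt(0.3), math.sqrt(0.7)
p00, post = chance_moves(l0, l1, 1, 1)[(0, 0)]   # both players applied sigma_1 first
print(round(p00, 12), post)       # 0.7 (0, 0, 1, 1, 1, 1, 1, 1, 1, 1)
print(total_outcome(post, 0, 0))  # (4, 4), i.e. O_00 + O_11 as in (31) with kappa_3 = kappa_4 = 0
```

With `chance_moves(1.0, 0.0, k1, k2)` only the branch $(\kappa_1,\kappa_2)$ survives, which is the classical case $|\lambda_0|^2 = 1$ discussed above.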



4.3 Twice repeated PD game played by means of the protocol (17)-(22)

Let us study the twice repeated PD game played with the use of our scheme. Analysis
of our protocol with an initial state of the general form (17) is a laborious task that deserves
a separate paper. Nevertheless, we can derive many interesting features
with less effort by considering an initial state of the form

$$
|\psi_{\mathrm{in}}\rangle = \bigotimes_{j=1}^{5}|\varphi_j\rangle, \quad\text{where } |\varphi_j\rangle \text{ is a state of the qubits } 2j-1 \text{ and } 2j.
\tag{32}
$$

Let us first consider the problem of optimizing the equilibrium payoffs over the
domain of initial states.



Proposition 4.3 There are infinitely many settings of the initial state (17) for which
the twice repeated PD game played with the use of the protocol (17)-(22) has a unique
subgame perfect equilibrium with the equilibrium payoff (2Q,2Q) such that Q > P.
Proof. Let us put the initial state (32) into the protocol (17)-(22), assuming that
$|\varphi_j\rangle = |\varphi\rangle$ for every $j$. Then the measurement $\{M_{\iota_1\iota_2}\}$ on the first pair of qubits does not
affect the other qubits. Moreover, given that the outcome $\iota_1\iota_2$ has occurred, the expected
outcome $E_{i.2}$ depends only on the operations on one pair of qubits prepared in $|\varphi\rangle$, due to the form of
(21). Therefore, regardless of the first-stage outcome $O_{\iota_1\iota_2}$, the players face
a 2x2 quantum game at the second stage (played via the MW approach). That is, the
players face the problem

$$
\left(|\varphi\rangle\langle\varphi|,\ \{\sigma_0,\sigma_1\},\ X'_i\right),
\tag{33}
$$

where players 1 and 2 apply operators from the set $\{\sigma_0,\sigma_1\}$ to the first and the second
qubit of $|\varphi\rangle$, respectively. The outcome operator $X'_i$ takes the form

$$
X'_i = \sum_{y_1,y_2=0,1}O^{i}_{y_1y_2}|y_1,y_2\rangle\langle y_1,y_2|,
\tag{34}
$$

where $O^{i}_{y_1y_2}$ denotes player $i$'s component of $O_{y_1y_2}$, and the expected outcome is equal to

$$
E_i(\sigma_{\kappa_1},\sigma_{\kappa_2}) = \mathrm{tr}\left((\sigma_{\kappa_1}\otimes\sigma_{\kappa_2})|\varphi\rangle\langle\varphi|(\sigma_{\kappa_1}\otimes\sigma_{\kappa_2})X'_i\right).
\tag{35}
$$

Obviously, the first-stage game is also described exactly by the triple (33). Since a quantum
game played according to the MW approach is a game expressed by a bimatrix, we
conclude that the protocol (17)-(22) with the initial state $|\varphi\rangle^{\otimes 5}$ can, in fact, be
treated as a twice repeated bimatrix game generated by (33).

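Let us substitute the payoffs of the PD game for $O_{y_1y_2}$ in the game (33) and examine it
with respect to the uniqueness of Nash equilibria. Take a state $|\varphi\rangle = \lambda_0|00\rangle + \lambda_1|11\rangle$
whose amplitudes satisfy the condition

$$
0 < |\lambda_0|^2 < \frac{\min\{T-R,\ P-S\}}{T-R+P-S}.
\tag{36}
$$

Then the inequalities

$$
E_1(\sigma_0,\sigma_{\kappa_2}) > E_1(\sigma_1,\sigma_{\kappa_2}) \quad\text{and}\quad E_2(\sigma_{\kappa_1},\sigma_0) > E_2(\sigma_{\kappa_1},\sigma_1)
\tag{37}
$$

are true for any $\kappa_1,\kappa_2 = 0,1$. Indeed, writing $x = |\lambda_0|^2$, we have
$E_1(\sigma_0,\sigma_0) - E_1(\sigma_1,\sigma_0) = (1-x)(P-S) - x(T-R)$ and
$E_1(\sigma_0,\sigma_1) - E_1(\sigma_1,\sigma_1) = (1-x)(T-R) - x(P-S)$, both of which are positive under (36);
the inequalities for player 2 follow by symmetry. The inequalities (37) imply the unique Nash equilibrium
$(\sigma_0,\sigma_0)$. Moreover, the first inequality of condition (36) ensures that

$$
E_1(\sigma_0,\sigma_0) = |\lambda_0|^2R + |\lambda_1|^2P > P.
\tag{38}
$$

Since the game constructed in the proof can be regarded as a classical twice repeated
game, we are allowed to use all facts of classical repeated game theory. One of them tells
us that a unique stage-game Nash equilibrium implies, for any finite number of repetitions,
a unique subgame perfect equilibrium in which the stage-game Nash equilibrium
is played at every stage. Setting $Q = |\lambda_0|^2R + |\lambda_1|^2P$, the equilibrium payoff is therefore
(2Q,2Q) with Q > P by (38), and since every value of $|\lambda_0|^2$ in the open interval (36) works, there
are infinitely many such initial states. This completes the proof. □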
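The role of condition (36) can be illustrated numerically for the stage game (33) with $|\varphi\rangle = \lambda_0|00\rangle + \lambda_1|11\rangle$. The sketch assumes the PD values T = 5, R = 3, P = 1, S = 0, for which the bound in (36) equals 1/3, and uses $x = |\lambda_0|^2$. The helper names `stage` and `sigma0_dominant` are hypothetical.

```python
T, R, P, S = 5, 3, 1, 0          # assumed PD values, T > R > P > S

# Payoffs attached to the basis states |y1 y2> (PD layout of Fig. 1(a)).
PAY = {(0, 0): (R, R), (0, 1): (S, T), (1, 0): (T, S), (1, 1): (P, P)}


def stage(x, k1, k2):
    """(E_1, E_2) of (35) for |phi> = l0|00> + l1|11> with x = |l0|^2.

    sigma_k1 (x) sigma_k2 maps |00> to |k1 k2> and |11> to |~k1 ~k2>.
    """
    a, b = PAY[(k1, k2)], PAY[(1 - k1, 1 - k2)]
    return tuple(x * a[i] + (1 - x) * b[i] for i in range(2))


def sigma0_dominant(x):
    """Inequalities (37): sigma_0 strictly dominates sigma_1 for both players."""
    return all(stage(x, 0, k)[0] > stage(x, 1, k)[0] and
               stage(x, k, 0)[1] > stage(x, k, 1)[1] for k in (0, 1))


bound = min(T - R, P - S) / (T - R + P - S)     # right-hand side of (36)
for x in (0.05, 0.2, 0.3):                      # values inside the interval (36)
    Q = stage(x, 0, 0)[0]                       # E_1(sigma_0, sigma_0), eq. (38)
    print(x, sigma0_dominant(x), Q > P, 2 * Q)
print(bound, sigma0_dominant(0.5))              # 0.3333333333333333 False
```

For these values the per-stage equilibrium payoff is $Q = 1 + 2x$. It exceeds P = 1 for every admissible x and approaches 5/3 as x tends to 1/3.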

Of course, the protocol (17)-(22) can be reformulated for any finitely repeated 2x2 game, and then a statement analogous to Proposition 4.3 can be formulated. Unfortunately, the number of qubits required by our protocol grows exponentially with the number of stages. For example, for a game repeated three times, the protocol (17)-(22) needs an additional 32 qubits to describe the third stage (two qubits for each of the 16 possible outcomes of the first two stages). In general, $\sum_{j=1}^{n} 2^{2j-1}$ qubits are required for a 2x2 game repeated n times.
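The qubit count $\sum_{j=1}^{n} 2^{2j-1}$ can be checked with a few lines of Python. The sketch reads stage $j$ as needing one qubit per player for each of the $4^{j-1}$ possible histories of the earlier stages (our interpretation of the protocol), and compares the sum with the closed form $\frac{2}{3}(4^n - 1)$ of this geometric series; the function name `qubits_required` is hypothetical.

```python
def qubits_required(n: int) -> int:
    """Qubits needed for a 2x2 game repeated n times: sum_{j=1}^{n} 2^(2j-1).

    Stage j has 4**(j - 1) possible histories, each needing one qubit per player,
    i.e. 2 * 4**(j - 1) = 2**(2j - 1) qubits for that stage.
    """
    return sum(2 ** (2 * j - 1) for j in range(1, n + 1))


for n in range(1, 5):
    # closed form of the geometric series: 2 * (4**n - 1) / 3
    assert qubits_required(n) == 2 * (4 ** n - 1) // 3
    print(n, qubits_required(n))
# Output (one pair per line): 1 2, 2 10, 3 42, 4 170
```

The value for n = 2 matches the ten qubits of $|\psi\rangle^{\otimes 5}$, and the step from n = 2 to n = 3 is the additional 32 qubits needed for the third stage.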

We shall now re-examine the problem of cooperation considered in [2]. We demonstrated in Section 3 that cooperation at the first stage is not possible in the game defined by the Iqbal and Toor scheme. However, we also showed that this protocol does not take into account a player's move at the second stage as a function of the first-stage result. Therefore, it does not, in fact, allow the cooperation problem to be studied properly. The following example shows that cooperation of the players is possible if the twice repeated PD game is played via our scheme.

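Example 4.4 Consider the PD game with payoff vectors

$$O_{00} = (4,4), \quad O_{01} = (0,5), \quad O_{10} = (5,0), \quad O_{11} = (1,1) \tag{39}$$

inserted in (20) and (21). Let us also assume that the initial state (17) takes the form

$$|\psi_{\mathrm{in}}\rangle = |0\rangle^{\otimes 2}\left(\sqrt{0.6}\,|0\rangle^{\otimes 2} + \sqrt{0.4}\,|1\rangle^{\otimes 2}\right)|0\rangle^{\otimes 6}. \tag{40}$$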

A game specified in this way differs from the classical one only in the subgame following the outcome $O_{00}$ of the first stage, because then $E_{i,2}$ depends on operations on the entangled third and fourth qubits. Since the first two qubits in the state $|00\rangle$ imply the classical PD game at the first stage, we are permitted to identify the actions 'cooperate' and 'defect' with $\sigma_0$ and $\sigma_1$, respectively, setting $C := \sigma_0$ and $D := \sigma_1$. Moreover, the quantum measurement after the first stage is trivial in this case and coincides with the classical observation in an extensive game. It follows that the game defined by (17)-(22), (39), (40) and the classical game can be represented by the same game tree with the same payoff values, except when 00 has been measured on the first pair of qubits after the first stage. Let us now determine the payoff outcomes at the second stage given that the post-measurement state of the first pair of qubits is $|00\rangle$ (in other words, when player 1's strategy is $\tau_1 = (\sigma^1_0, \sigma^3_{\kappa_3}, \sigma^5_{\kappa_5}, \sigma^7_{\kappa_7}, \sigma^9_{\kappa_9})$ and player 2's strategy is $\tau_2 = (\sigma^2_0, \sigma^4_{\kappa_4}, \sigma^6_{\kappa_6}, \sigma^8_{\kappa_8}, \sigma^{10}_{\kappa_{10}})$, which gives a strategy profile of the form $(\tau_1, \tau_2) = ((\sigma^1_0, \cdot,\cdot,\cdot,\cdot), (\sigma^2_0, \cdot,\cdot,\cdot,\cdot))$). Given the initial state (40) and the form of the operators (21), the payoff outcome $E_{i,2}$ for each $\kappa_3, \kappa_4 \in \{0, 1\}$ and $i = 1, 2$ is as follows:

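$$E_{i,2}\bigl((\sigma^1_0, \sigma^3_{\kappa_3}, \cdot,\cdot,\cdot), (\sigma^2_0, \sigma^4_{\kappa_4}, \cdot,\cdot,\cdot)\bigr) = 0.6\,O^i_{\kappa_3\kappa_4} + 0.4\,O^i_{\bar{\kappa}_3\bar{\kappa}_4}, \qquad \bar{\kappa} := 1-\kappa. \tag{41}$$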
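The second-stage subgame following $O_{00}$ can be tabulated directly from (41). The sketch below hard-codes the payoffs (39) and uses a hypothetical helper `stage2_payoff`; it checks that $\sigma_1$ strictly dominates $\sigma_0$ for both players in that subgame, which is why the subgame perfect equilibria considered next have both players applying $\sigma_1$ from the third qubit onward.

```python
# Payoff vectors (39): O[(y1, y2)] = (payoff to player 1, payoff to player 2)
O = {(0, 0): (4, 4), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def stage2_payoff(k3, k4):
    """Eq. (41): E_{i,2} = 0.6*O^i_{k3 k4} + 0.4*O^i_{(1-k3)(1-k4)}, i = 1, 2.

    k3 (player 1, qubit 3) and k4 (player 2, qubit 4): 0 -> sigma_0, 1 -> sigma_1.
    """
    a, b = O[(k3, k4)], O[(1 - k3, 1 - k4)]
    return tuple(0.6 * a[i] + 0.4 * b[i] for i in range(2))


# sigma_1 strictly dominates sigma_0 for player 1 (against both values of k4) ...
p1_dominant = all(stage2_payoff(1, k4)[0] > stage2_payoff(0, k4)[0] for k4 in (0, 1))
# ... and for player 2 (against both values of k3)
p2_dominant = all(stage2_payoff(k3, 1)[1] > stage2_payoff(k3, 0)[1] for k3 in (0, 1))

print(p1_dominant, p2_dominant)  # True True
print(stage2_payoff(1, 1))       # equilibrium payoffs of the subgame, about (2.2, 2.2)
```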

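The extensive form of the game with expected payoffs $E_{i,1} + E_{i,2}$ determined by (39) and (41) is shown in Fig. 4. Let us examine this game for subgame perfect equilibria. Such a profile has to induce a Nash equilibrium in every subgame determined by an outcome of the first stage. In our case, this means that both players apply $\sigma_1$ to all their qubits from the third qubit onward. Consequently, in the search for subgame perfect equilibria, we need to consider only the following profiles:

$$(\tau_1, \tau_2) \in \Bigl\{\bigl((\sigma^1_{\kappa_1}, \sigma^3_1, \sigma^5_1, \sigma^7_1, \sigma^9_1), (\sigma^2_{\kappa_2}, \sigma^4_1, \sigma^6_1, \sigma^8_1, \sigma^{10}_1)\bigr) : \kappa_1, \kappa_2 \in \{0,1\}\Bigr\}. \tag{42}$$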

Then it turns out that the noncooperative subgame perfect equilibrium is still preserved. If one of the players picks $\sigma_1$ at the first stage, the best response of the other one is to pick $\sigma_1$ too. Therefore, the profile $(\tau_1, \tau_2) = ((\sigma^1_1, \sigma^3_1, \sigma^5_1, \sigma^7_1, \sigma^9_1), (\sigma^2_1, \sigma^4_1, \sigma^6_1, \sigma^8_1, \sigma^{10}_1))$ constitutes a subgame perfect equilibrium. However, contrary to the classical twice repeated PD, there is another subgame perfect equilibrium $(\tau'_1, \tau'_2)$ in which each player chooses $\sigma_0$ (cooperates) at the first stage, i.e., $\tau'_1 = (\sigma^1_0, \sigma^3_1, \sigma^5_1, \sigma^7_1, \sigma^9_1)$ and $\tau'_2 = (\sigma^2_0, \sigma^4_1, \sigma^6_1, \sigma^8_1, \sigma^{10}_1)$. Moreover, only the latter equilibrium is reasonable, since it yields a payoff of 6.2 instead of 2 to each player.

Figure 4: The extensive form of the twice repeated Prisoner's Dilemma played through protocol (17)-(22) with the initial state (40).
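The backward-induction argument of Example 4.4 can be replayed numerically. In the sketch below (our own check, with hypothetical names), the second stage is replaced by its equilibrium payoffs: 2.2 each after $(C, C)$, where both players apply $\sigma_1$ to the entangled pair in (41), and 1 each after any other first-stage outcome, where the second stage is the classical PD with equilibrium $(D, D)$. The pure Nash equilibria of the resulting first-stage game are then the first-stage moves of the subgame perfect equilibria.

```python
# Stage-1 PD payoffs (39); index 0 = C (sigma_0), 1 = D (sigma_1)
O = {(0, 0): (4, 4), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def continuation(a1, a2):
    """Second-stage equilibrium payoffs after first-stage outcome (a1, a2).

    After (C, C) the entangled subgame (41) has equilibrium (sigma_1, sigma_1)
    worth 2.2 each; otherwise the stage is the classical PD with (D, D).
    """
    return (2.2, 2.2) if (a1, a2) == (0, 0) else O[(1, 1)]


def total(a1, a2):
    """E_{i,1} + E_{i,2} along the path starting with (a1, a2)."""
    return tuple(O[(a1, a2)][i] + continuation(a1, a2)[i] for i in range(2))


def is_nash(a1, a2):
    u1, u2 = total(a1, a2)
    return u1 >= total(1 - a1, a2)[0] and u2 >= total(a1, 1 - a2)[1]


equilibria = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1) if is_nash(a1, a2)]
print(equilibria)  # [(0, 0), (1, 1)]
print(total(0, 0), total(1, 1))
```

Cooperation at the first stage survives because deviating from $(C, C)$ gains $5 - 4 = 1$ at the first stage but loses $2.2 - 1 = 1.2$ at the second.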

Example 4.4 shows that cooperation of the players is possible when the twice repeated PD game is played according to our scheme. Unfortunately, the example does not settle this problem for every PD game. The condition 2R > T + S imposed on the payoffs allows an arbitrarily large finite T to be selected (provided that a sufficiently small S is selected). We suspect that a sufficiently large T may induce the players to defect even if the game is played in the quantum domain.

5 Conclusion 

Our paper proves that repeated games can be quantized. That is, we have shown that an appropriately modified MW scheme for 2x2 quantum games can indeed generalize a twice repeated game. In addition, such a quantized game can be further analyzed in both strategic and extensive form. Our results also indicate (using the twice repeated Prisoner's Dilemma as an example) that playing repeated games in the quantum domain can give superior results in comparison with the classical ones. At the same time, we have explained why the previous approach [2] cannot be treated as a correct protocol for quantum repeated games. The main objection is that the protocol [2] is unable to take into account the full set of strategies available to the players. In contrast to Iqbal and Toor's scheme, the protocol defined in this paper is free from this fault.